6 research outputs found

    Evolving comprehensible and scalable solvers using CGP for solving some real-world inspired problems

    My original contribution to knowledge is the application of Cartesian Genetic Programming to automatically design scalable and human-understandable metaheuristics that find suitable solutions for real-world NP-hard and discrete problems. This technique is thought to raise the generality of a problem-solving process, allowing some supervised machine learning tasks and being able to evolve non-deterministic algorithms. Two extensions of Cartesian Genetic Programming are presented. Iterative Cartesian Genetic Programming can encode loops and nested loops together with their termination criteria, making the whole programming construct susceptible to evolutionary modification. This newly developed extension and its application to metaheuristics are demonstrated to discover effective solvers for NP-hard and discrete problems. This thesis also extends Cartesian Genetic Programming and Iterative Cartesian Genetic Programming to adapt a hyper-heuristic reproductive operator while exploring the automatic design space. It is demonstrated that the exploration of an automated design space can be improved when specific types of active and non-active genes are mutated. A series of rigorous empirical investigations demonstrates that lowering the comprehension barrier of automatically designed algorithms can help in communicating and identifying effective and ineffective patterns of primitives.
The complete evolution of loops and nested loops without imposing a hard limit on the number of recursive calls is shown to broaden the automatic design space. Finally, it is argued that the capability of a learning objective function to assess the scalable potential of a generated algorithm can be beneficial to a generative hyper-heuristic.
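
The distinction between active and non-active genes mentioned in this abstract can be illustrated with a minimal sketch of standard feed-forward Cartesian Genetic Programming. This is not the thesis's Iterative CGP or its mutation operator; the function set, the genome layout, and the names `FUNCTIONS`, `active_nodes`, and `evaluate` are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical function set, chosen only for this illustration.
FUNCTIONS: List[Callable[[float, float], float]] = [
    lambda a, b: a + b,  # index 0: add
    lambda a, b: a - b,  # index 1: subtract
    lambda a, b: a * b,  # index 2: multiply
]

# A node gene is (function index, first input address, second input address).
# Addresses 0..n_inputs-1 are program inputs; node i has address n_inputs + i.
# Feed-forward assumption: a node only reads addresses lower than its own.
Node = Tuple[int, int, int]


def active_nodes(nodes: Sequence[Node], output_addr: int, n_inputs: int) -> List[int]:
    """Return the sorted indices of nodes reachable from the output address."""
    active = set()
    stack = [output_addr]
    while stack:
        addr = stack.pop()
        if addr < n_inputs:
            continue  # program input, not a node
        idx = addr - n_inputs
        if idx in active:
            continue
        active.add(idx)
        _, a, b = nodes[idx]
        stack.extend((a, b))
    return sorted(active)


def evaluate(nodes: Sequence[Node], output_addr: int, inputs: Sequence[float]) -> float:
    """Evaluate only the active nodes; non-active nodes never influence the result."""
    n_inputs = len(inputs)
    values = list(inputs) + [0.0] * len(nodes)
    for idx in active_nodes(nodes, output_addr, n_inputs):
        f, a, b = nodes[idx]
        values[n_inputs + idx] = FUNCTIONS[f](values[a], values[b])
    return values[output_addr]


# Example genome with two inputs (addresses 0 and 1) and three nodes.
GENOME: List[Node] = [
    (0, 0, 1),  # address 2: x0 + x1
    (2, 0, 0),  # address 3: x0 * x0 (never referenced, so non-active)
    (2, 2, 1),  # address 4: (x0 + x1) * x1
]
OUTPUT_ADDR = 4

print(active_nodes(GENOME, OUTPUT_ADDR, 2))      # [0, 2]
print(evaluate(GENOME, OUTPUT_ADDR, [3.0, 4.0]))  # 28.0
```

In this sketch, mutating the gene at address 3 leaves the program's behaviour unchanged until a later mutation connects it to the output, which is why treating active and non-active genes differently during mutation can change how the design space is explored.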

    Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD.

    Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers to physically bring data together in one place or follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduce the challenges of these methods. These include ethico-legal constraints which limit researchers' ability to physically bring data together and the analytical inflexibility associated with conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions that are responsible for the data. Each institution has control over who can access their data. The platform allows an analyst to pass commands to each server and the analyst receives results that do not disclose the individual-level data of any study participants. DataSHIELD uses Opal which is a data integration system used by epidemiological studies and developed by the OBiBa open source project in the domain of bioinformatics. However, until now the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities available in the DataSHIELD R packages. We present a new architecture ("resources") for DataSHIELD and Opal to allow large, complex datasets to be used at their original location, in their original format and with external computing facilities. We provide some real big data analysis examples in genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big data infrastructures such as GA4GH or EGA, and make use of shell commands. Our new infrastructure will help researchers to perform data analyses in a privacy-protected way from existing data sharing initiatives or projects. 
To help researchers use this framework, we describe selected packages and present an online book (https://isglobal-brge.github.io/resource_bookdown)
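
The principle described in this abstract, where individual-level data stay at each institution and only non-disclosive results reach the analyst, can be sketched as follows. This is a conceptual illustration in Python, not the DataSHIELD or Opal API (which are R-based); the names `SiteSummary`, `site_summary`, `pooled_mean`, and the `MIN_COUNT` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical disclosure threshold: a site refuses to release aggregates
# computed from fewer records than this.
MIN_COUNT = 3


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate that leaves a site; contains no individual-level values."""
    count: int
    total: float


def site_summary(values: Sequence[float]) -> SiteSummary:
    """Runs where the data live; only the count and the sum are returned."""
    if len(values) < MIN_COUNT:
        raise ValueError("too few records to release an aggregate")
    return SiteSummary(count=len(values), total=float(sum(values)))


def pooled_mean(summaries: List[SiteSummary]) -> float:
    """Runs on the analyst's side; combines per-site aggregates into one mean."""
    n = sum(s.count for s in summaries)
    return sum(s.total for s in summaries) / n


# Each list stands in for data held on a separate institutional server.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0, 6.0, 7.0]

summaries = [site_summary(site_a), site_summary(site_b)]
print(pooled_mean(summaries))  # 4.0
```

The pooled mean equals the mean that would be obtained from physically combined data, yet the analyst only ever sees one count and one sum per site.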

    Tutorials at PPSN 2016

    PPSN 2016 hosts a total of 16 tutorials covering a broad range of current research in evolutionary computation. The tutorials range from introductory to advanced and specialized, but all can be attended without prior requirements. All PPSN attendees are cordially invited to take this opportunity to learn about ongoing research activities in our field.

    Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD

    Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers to physically bring data together in one place or follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduce the challenges of these methods. These include ethico-legal constraints which limit researchers' ability to physically bring data together and the analytical inflexibility associated with conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions that are responsible for the data. Each institution has control over who can access their data. The platform allows an analyst to pass commands to each server and the analyst receives results that do not disclose the individual-level data of any study participants. DataSHIELD uses Opal which is a data integration system used by epidemiological studies and developed by the OBiBa open source project in the domain of bioinformatics. However, until now the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities available in the DataSHIELD R packages. We present a new architecture ("resources") for DataSHIELD and Opal to allow large, complex datasets to be used at their original location, in their original format and with external computing facilities. We provide some real big data analysis examples in genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big data infrastructures such as GA4GH or EGA, and make use of shell commands. Our new infrastructure will help researchers to perform data analyses in a privacy-protected way from existing data sharing initiatives or projects. 
To help researchers use this framework, we describe selected packages and present an online book (). Data sharing enhances understanding of research results beyond what is possible from any single study. Data pooling across multiple studies increases statistical power and allows exploration of between-study heterogeneity. However, ethico-legal considerations and concerns about intellectual or commercial value regularly prevent or impede physical data sharing. DataSHIELD is designed to circumvent this problem. Despite the growing confidence users have been placing in DataSHIELD to perform privacy-protected analyses of data in cohort consortia, there remain real challenges to federated analytics. They include handling the wide range of data formats and big data sources used, for example, in 'omics-based research. This article describes the development and implementation of the new "resources" architecture in DataSHIELD that overcomes this limitation. We illustrate its value with real-world examples related to genomics and geographical data. We also demonstrate how genomic data sharing initiatives such as GA4GH and EGA can benefit directly from our development. Our new infrastructure will help researchers to perform data analyses in a privacy-protected way from existing data sharing initiatives or projects.

    Epimutation detection in the clinical context: guidelines and a use case from a new Bioconductor package

    Epimutations are rare alterations of the normal DNA methylation pattern at specific loci, which can lead to rare diseases. Methylation microarrays enable genome-wide epimutation detection, but technical limitations prevent their use in clinical settings: methods applied to rare disease data cannot be easily incorporated into standard analysis pipelines, while epimutation methods implemented in R packages (ramr) have not been validated for rare diseases. We have developed epimutacions, a Bioconductor package (https://bioconductor.org/packages/release/bioc/html/epimutacions.html). epimutacions implements two previously reported methods and four new statistical approaches to detect epimutations, along with functions to annotate and visualize epimutations. Additionally, we have developed a user-friendly Shiny app (https://github.com/isglobal-brge/epimutacionsShiny) to facilitate epimutation detection for non-bioinformatician users. First, we compared the performance of the epimutacions and ramr packages using three public datasets with experimentally validated epimutations. Methods in epimutacions performed well at low sample sizes and outperformed methods in ramr. Second, we used two general-population cohorts of children (INMA and HELIX) to determine the technical and biological factors that affect epimutation detection, providing guidelines on how to design the experiments and preprocess the data. In these cohorts, most epimutations did not correlate with detectable regional gene expression changes. Finally, we exemplified how epimutacions can be used in a clinical context. We ran epimutacions in a cohort of children with autism disorder and identified novel recurrent epimutations in candidate genes for autism. Overall, we present epimutacions, a new Bioconductor package for incorporating epimutation detection into rare disease diagnosis, and provide guidelines for study design and data analysis.